[fix][backend] restore FailRetry logic to skip completed evaluators by Lysssyo · Pull Request #402 · coze-dev/coze-loop

Lysssyo · 2026-01-22T05:56:27Z

What type of PR is this?

fix: A bug fix

Check the PR title

This PR title matches the format: [<type>][<scope>] <description>. For example: [fix][backend] flaky fix
The description of this PR title is user-oriented and clear enough for others to understand.
Add documentation if the current PR requires user awareness at the usage level.
This PR is written in English. PRs not in English will not be reviewed.

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR(en: English/zh: Chinese)

1. Defect 1: Historical Result ID Mapping Error

1.1 Problem Description

In the GetExptItemTurnResults method, the mapping table for historical results was built with EvaluatorVersionID (the evaluator version ID) as the value. The value should have been EvaluatorResultID (the primary key ID of the evaluation record).

1.2 Fix Details

File: backend/modules/evaluation/domain/service/expt_result_impl.go
Method: GetExptItemTurnResults

Code Comparison:

// Before (Bug)
// Error: Value assigned as VersionID (e.g., 1001)
turnEvaluatorVerIDToResultID[ref.ExptTurnResultID][ref.EvaluatorVersionID] = ref.EvaluatorVersionID

// After (Fix)
// Correct: Value assigned as ResultID (e.g., 74839201)
turnEvaluatorVerIDToResultID[ref.ExptTurnResultID][ref.EvaluatorVersionID] = ref.EvaluatorResultID

1.3 Verification and Impact

Table: expt_turn_result_run_log
Column: evaluator_result_ids
Before Fix: {"EvalVerIDToResID":{"1001":1001}} — Invalid reference pointing to a non-existent record ID.
After Fix: {"EvalVerIDToResID":{"1001":74839201}} — Valid reference correctly pointing to the primary key in the evaluator_record table.

Note: Left unfixed, the database would eventually "self-heal", because re-running the LLM evaluation generates new data that overwrites the old data. However, the primary purpose of the PreEval stage is to reuse old data. With an incorrect ID, the Worker cannot retrieve the corresponding historical record when it initializes the context via e.evaluatorRecordService.BatchGetEvaluatorRecord, so the checkpoint recovery mechanism fails at the very first step.
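To make the impact concrete, here is a minimal, self-contained sketch of the mapping and a primary-key lookup against it. The type evaluatorResultRef, the helpers buildVerIDToResultID and countResolvable, and the in-memory recordTable are hypothetical stand-ins, not code from the repository; the sketch assumes that BatchGetEvaluatorRecord behaves like a lookup by evaluator record primary key.

```go
package main

import "fmt"

// evaluatorResultRef is a hypothetical stand-in for the reference rows read in
// GetExptItemTurnResults; only the three fields named in the PR are modelled.
type evaluatorResultRef struct {
	ExptTurnResultID   int64
	EvaluatorVersionID int64
	EvaluatorResultID  int64
}

// buildVerIDToResultID returns turn result ID -> evaluator version ID -> value.
// With buggy=true it reproduces the pre-fix behaviour (value = version ID);
// with buggy=false it stores the evaluator result ID, as the fix does.
func buildVerIDToResultID(refs []evaluatorResultRef, buggy bool) map[int64]map[int64]int64 {
	out := make(map[int64]map[int64]int64)
	for _, ref := range refs {
		if out[ref.ExptTurnResultID] == nil {
			out[ref.ExptTurnResultID] = make(map[int64]int64)
		}
		value := ref.EvaluatorResultID
		if buggy {
			value = ref.EvaluatorVersionID
		}
		out[ref.ExptTurnResultID][ref.EvaluatorVersionID] = value
	}
	return out
}

// countResolvable counts how many mapped values exist as primary keys in
// recordTable (assumption: an in-memory stand-in for the evaluator record table).
func countResolvable(recordTable map[int64]bool, verIDToResultID map[int64]int64) int {
	found := 0
	for _, resultID := range verIDToResultID {
		if recordTable[resultID] {
			found++
		}
	}
	return found
}

func main() {
	refs := []evaluatorResultRef{
		{ExptTurnResultID: 1, EvaluatorVersionID: 1001, EvaluatorResultID: 74839201},
	}
	recordTable := map[int64]bool{74839201: true}

	before := buildVerIDToResultID(refs, true)[1]
	after := buildVerIDToResultID(refs, false)[1]
	fmt.Println("before fix:", before, "resolvable:", countResolvable(recordTable, before))
	fmt.Println("after fix: ", after, "resolvable:", countResolvable(recordTable, after))
}
```

In this sketch the pre-fix mapping yields no resolvable record, which corresponds to the Worker finding nothing to reuse.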

2. Defect 2: Worker Cache Key Mismatch

2.1 Problem Description

Even with a correct ID mapping, the Worker used RecordID (record.ID) as the map key when constructing the context cache, while the subsequent read logic looked entries up by EvaluatorVersionID. This key mismatch caused every cache lookup to miss.

2.2 Fix Details

File: backend/modules/evaluation/domain/service/expt_run_item_impl.go
Method: buildExptTurnEvalCtx

Code Comparison:

// Before (Bug)
// Error: Using RecordID as Key
recordMap[record.ID] = record 

// After (Fix)
// Correct: Using EvaluatorVersionID as Key, aligning with the read logic
recordMap[record.EvaluatorVersionID] = record
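The mismatch can be reproduced in isolation. The following is a minimal sketch, not the repository code: evaluatorRecord is a reduced hypothetical type, and indexByRecordID and indexByVersionID are illustrative helpers that mirror the before and after assignments shown above.

```go
package main

import "fmt"

// evaluatorRecord is a reduced, hypothetical version of the evaluator record entity.
type evaluatorRecord struct {
	ID                 int64
	EvaluatorVersionID int64
}

// indexByRecordID mirrors the pre-fix cache: keyed by the record's primary key.
func indexByRecordID(records []*evaluatorRecord) map[int64]*evaluatorRecord {
	m := make(map[int64]*evaluatorRecord, len(records))
	for _, r := range records {
		m[r.ID] = r
	}
	return m
}

// indexByVersionID mirrors the fixed cache: keyed by evaluator version ID,
// which is the key the read path uses.
func indexByVersionID(records []*evaluatorRecord) map[int64]*evaluatorRecord {
	m := make(map[int64]*evaluatorRecord, len(records))
	for _, r := range records {
		m[r.EvaluatorVersionID] = r
	}
	return m
}

func main() {
	records := []*evaluatorRecord{{ID: 74839201, EvaluatorVersionID: 1001}}
	const lookupKey = int64(1001) // the read path looks up by evaluator version ID

	_, hitBefore := indexByRecordID(records)[lookupKey]
	_, hitAfter := indexByVersionID(records)[lookupKey]
	fmt.Println("keyed by record ID, hit:", hitBefore)
	fmt.Println("keyed by version ID, hit:", hitAfter)
}
```

A lookup by version ID only hits the record-ID-keyed map if a version ID happens to equal some record's primary key, which is why the bug showed up as a consistent miss rather than an error.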

2.3 Verification (Memory Level)

After the fix, the Worker execution flow is as follows:

Context Construction (buildExptTurnEvalCtx):
Correctly constructs the etec.ExptTurnRunResult.EvaluatorResults Map with EvaluatorVersionID as the Key.

Execution (CallEvaluators):
When iterating through evaluators, the cache is successfully hit, allowing completed tasks to be skipped.

// backend/modules/evaluation/domain/service/expt_run_item_turn_impl.go

func (e *DefaultExptTurnEvaluationImpl) CallEvaluators(ctx context.Context, etec *entity.ExptTurnEvalCtx, targetResult *entity.EvalTargetRecord) (map[int64]*entity.EvaluatorRecord, error) {  
    // ...
    for _, evaluatorVersion := range expt.Evaluators {  
        // Verification Point: Lookup using VersionID now successfully retrieves the cache
        existResult := etec.ExptTurnRunResult.GetEvaluatorRecord(evaluatorVersion.GetEvaluatorVersionID())  
  
        if existResult != nil && existResult.Status == entity.EvaluatorRunStatusSuccess {  
            // Successfully skipped!
            evaluatorResults[existResult.ID] = existResult
            continue  
        }  
        // ...
    }
    // ...
}
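The skip behaviour can also be exercised outside the service. The sketch below is a simplified simulation under stated assumptions: evaluatorRecord, the status constants, and runEvaluators are hypothetical reductions of the real entities and of CallEvaluators, and the run callback stands in for an actual evaluator invocation.

```go
package main

import "fmt"

const (
	statusSuccess = "success"
	statusFail    = "fail"
)

// evaluatorRecord is a reduced, hypothetical version of the evaluator record entity.
type evaluatorRecord struct {
	ID                 int64
	EvaluatorVersionID int64
	Status             string
}

// runEvaluators reuses cached successful records (looked up by evaluator version ID)
// and calls run only for evaluators without a successful cached record.
// It returns the results keyed by record ID and the version IDs actually executed.
func runEvaluators(
	versionIDs []int64,
	cached map[int64]*evaluatorRecord,
	run func(versionID int64) *evaluatorRecord,
) (map[int64]*evaluatorRecord, []int64) {
	results := make(map[int64]*evaluatorRecord, len(versionIDs))
	var executed []int64
	for _, vid := range versionIDs {
		if prev := cached[vid]; prev != nil && prev.Status == statusSuccess {
			results[prev.ID] = prev // completed earlier: skip re-evaluation
			continue
		}
		rec := run(vid)
		executed = append(executed, vid)
		results[rec.ID] = rec
	}
	return results, executed
}

func main() {
	// Cache keyed by evaluator version ID, as after the fix.
	cached := map[int64]*evaluatorRecord{
		1001: {ID: 74839201, EvaluatorVersionID: 1001, Status: statusSuccess},
		1002: {ID: 74839202, EvaluatorVersionID: 1002, Status: statusFail},
	}
	nextID := int64(74839300) // arbitrary illustrative ID range for new records
	run := func(versionID int64) *evaluatorRecord {
		nextID++
		return &evaluatorRecord{ID: nextID, EvaluatorVersionID: versionID, Status: statusSuccess}
	}

	results, executed := runEvaluators([]int64{1001, 1002}, cached, run)
	fmt.Println("re-executed version IDs:", executed)
	fmt.Println("total results:", len(results))
}
```

Only the evaluator whose previous run failed is executed again; the successful one is carried over unchanged.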

(Optional) Which issue(s) this PR fixes

Fixes #400

codecov · 2026-01-22T12:44:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
- Coverage   70.00%   70.00%   -0.01%     
==========================================
  Files         619      619              
  Lines       58718    58718              
==========================================
- Hits        41106    41104       -2     
- Misses      14627    14628       +1     
- Partials     2985     2986       +1

Flag	Coverage Δ
unittests	`70.00% <100.00%> (-0.01%)`	⬇️

Files with missing lines	Coverage Δ
...ules/evaluation/domain/service/expt_result_impl.go	`64.86% <100.00%> (ø)`
...es/evaluation/domain/service/expt_run_item_impl.go	`67.92% <100.00%> (ø)`

... and 2 files with indirect coverage changes

lsy357 · 2026-01-23T03:49:27Z

Change merged, and this patch will ship with the next version. Thanks for making loop better!

…402) fix(worker): restore FailRetry logic to skip completed evaluators Co-authored-by: Lysssyo <11906167+lysssyo@user.noreply.gitee.com>

fix(worker): restore FailRetry logic to skip completed evaluators

55a8d2d

Lysssyo mentioned this pull request Jan 22, 2026

[BUG] Experiment Checkpoint Recovery Failure in FailRetry Mode #400

Closed

5 tasks

lsy357 approved these changes Jan 23, 2026

tpfz approved these changes Jan 23, 2026

lsy357 merged commit 9866134 into coze-dev:main Jan 23, 2026
12 checks passed

lsy357 merged 1 commit into coze-dev:main from Lysssyo:fix/evaluation_experiment_failRetry_checkpoint_recovery
